Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources
Abstract
Training a POS tagging model with cross-lingual transfer learning usually requires linguistic knowledge and resources describing the relation between the source language and the target language. In this paper, we introduce a cross-lingual transfer learning model for POS tagging that needs no ancillary resources such as parallel corpora. The proposed cross-lingual model uses a common BLSTM that enables knowledge transfer from other languages, and private BLSTMs for language-specific representations. The cross-lingual model is trained with language-adversarial training and bidirectional language modeling as auxiliary objectives, so that it better represents language-general information without losing information specific to the target language. Evaluated on POS datasets from 14 languages in the Universal Dependencies corpus, the proposed transfer learning model improves POS tagging performance on the target languages without exploiting any linguistic knowledge about the relation between the source language and the target language.
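The architecture described above (a shared BLSTM, per-language private BLSTMs, and a language discriminator trained adversarially on the shared representation) can be sketched as follows. This is a minimal PyTorch sketch, not the authors' implementation: all class and parameter names (`CrossLingualTagger`, `GradReverse`, `lambd`, the embedding and hidden sizes) are illustrative assumptions, and the bidirectional language-modeling objective is omitted for brevity.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Gradient-reversal layer: identity in the forward pass,
    negated (scaled) gradient in the backward pass, so the shared
    encoder is trained to fool the language discriminator."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class CrossLingualTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, num_langs,
                 emb_dim=64, hidden=64, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared BLSTM: captures language-general features.
        self.shared = nn.LSTM(emb_dim, hidden,
                              bidirectional=True, batch_first=True)
        # One private BLSTM per language for language-specific features.
        self.private = nn.ModuleList(
            nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
            for _ in range(num_langs))
        # Tagger reads the concatenated shared + private states.
        self.tagger = nn.Linear(4 * hidden, num_tags)
        # Language discriminator, fed through the gradient-reversal layer.
        self.discrim = nn.Linear(2 * hidden, num_langs)

    def forward(self, tokens, lang_id):
        x = self.embed(tokens)                      # (batch, seq, emb)
        h_shared, _ = self.shared(x)                # (batch, seq, 2*hidden)
        h_private, _ = self.private[lang_id](x)     # (batch, seq, 2*hidden)
        tag_logits = self.tagger(
            torch.cat([h_shared, h_private], dim=-1))
        # Adversarial branch: the discriminator sees reversed gradients,
        # pushing the shared encoder toward language-invariant features.
        rev = GradReverse.apply(h_shared.mean(dim=1), self.lambd)
        lang_logits = self.discrim(rev)
        return tag_logits, lang_logits
```

Training would sum a per-token tagging loss over `tag_logits` and a language-classification loss over `lang_logits`; the reversal layer makes the second loss adversarial with respect to the shared encoder.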
Similar papers
Empirical Gaussian priors for cross-lingual transfer learning
Sequence model learning algorithms typically maximize log-likelihood minus the norm of the model (or minimize Hamming loss + norm). In cross-lingual part-of-speech (POS) tagging, our target language training data consists of sequences of sentences with word-by-word labels projected from translations in k languages for which we have labeled data, via word alignments. Our training data is therefore...
Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection
Cross-lingual projection of linguistic annotation suffers from many sources of bias and noise, leading to unreliable annotations that cannot be used directly. In this paper, we introduce a novel approach to sequence tagging that learns to correct the errors from cross-lingual projection using an explicit debiasing layer. This is framed as joint learning over two corpora, one tagged with gold st...
Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian
Because of the small size of Romanian corpora, the performance of a PoS tagger or a dependency parser trained with standard supervised methods falls far short of the performance achieved in most languages. That is why we apply state-of-the-art methods for cross-lingual transfer to Romanian tagging and parsing, from English and several Romance languages. We compare the performance with mon...
Increasing the Quality and Quantity of Source Language Data for Unsupervised Cross-Lingual POS Tagging
Bilingual corpora offer a promising bridge between resource-rich and resource-poor languages, enabling the development of natural language processing systems for the latter. English is often selected as the resource-rich language, but another choice might give better performance. In this paper, we consider the task of unsupervised cross-lingual POS tagging, and construct a model that predicts t...
Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision
Täckström, O. 2013. Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision. Acta Universitatis Upsaliensis. Studia Linguistica Upsaliensia 14. xii+215 pp. Uppsala. ISBN 978-91-554-8631-0. Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the ling...